
Creators/Authors contains: "Thomas, James"


  1. The stochastic modeling and calibration of an anisotropic elasto-plastic model for additive manufacturing materials are addressed in this work. We specifically focus on 316L stainless steel, produced by directed energy deposition. Tensile specimens machined from two additive manufactured (AM) box-structures were used to characterize material anisotropy and random spatial variations in elasticity and plasticity material parameters. Tensile specimens were cut parallel (horizontal) and perpendicular (vertical) to the AM deposition plane and were indexed by location. These results show substantial variability in both regimes, with fluctuation levels that differ between specimens loaded in the parallel and perpendicular build directions. Stochastic representations for the stiffness and Hill’s criterion coefficients random fields are presented next. Information-theoretic models are derived within the class of translation random fields, with the aim of promoting identifiability with limited data. The approach allows for the constitutive models to be generated on arbitrary geometries, using the so-called stochastic partial differential equation approach (to sampling). These representations are then partially calibrated using the aforementioned experimental results, hence enabling subsequent propagation analyses. Sampling is finally exemplified on the considered structure.
    Free, publicly-accessible full text available December 1, 2024
  2. Amber is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for acceleration of dense linear algebra applications, such as machine learning (ML), image processing, and computer vision. It is designed using an agile accelerator-compiler co-design flow; the compiler updates automatically with hardware changes, enabling continuous application-level evaluation of the hardware-software system. To increase hardware utilization and minimize reconfigurability overhead, Amber features the following: 1) dynamic partial reconfiguration (DPR) of the CGRA for higher resource utilization by allowing fast switching between applications and partitioning resources between simultaneous applications; 2) streaming memory controllers supporting affine access patterns for efficient mapping of dense linear algebra; and 3) low-overhead transcendental and complex arithmetic operations. The physical design of Amber features a unique clock distribution method and timing methodology to efficiently lay out its hierarchical and tile-based design. Amber achieves a peak energy efficiency of 538 INT16 GOPS/W and 483 BFloat16 GFLOPS/W. Compared with a CPU, a GPU, and a field-programmable gate array (FPGA), Amber has up to 3902x, 152x, and 107x better energy-delay product (EDP), respectively.
    Free, publicly-accessible full text available March 1, 2025
  3. We present Fleet, a framework that offers a massively parallel streaming model for FPGAs and is effective in a number of domains well-suited for FPGA acceleration, including parsing, compression, and machine learning. Fleet requires the user to specify RTL for a processing unit that serially processes every input token in a stream, a far simpler task than writing a parallel processing unit. It then takes the user’s processing unit and generates a hardware design with many copies of the unit as well as memory controllers to feed the units with separate streams and drain their outputs. Fleet includes a Chisel-based processing unit language. The language maintains Chisel’s low-level performance control while adding a few productivity features, including automatic handling of ready-valid signaling and a native and automatically pipelined BRAM type. We evaluate Fleet on six different applications, including JSON parsing and integer compression, fitting hundreds of Fleet processing units on the Amazon F1 FPGA and outperforming CPU implementations by over 400× and GPU implementations by over 9× in performance per watt while requiring a similar number of lines of code.
  4. Objective: Randomized controlled trials (RCTs) are the gold standard method for evaluating whether a treatment works in health care but can be difficult to find and make use of. We describe the development and evaluation of a system to automatically find and categorize all new RCT reports. Materials and Methods: Trialstreamer continuously monitors PubMed and the World Health Organization International Clinical Trials Registry Platform, looking for new RCTs in humans using a validated classifier. We combine machine learning and rule-based methods to extract information from the RCT abstracts, including free-text descriptions of trial PICO (populations, interventions/comparators, and outcomes) elements and map these snippets to normalized MeSH (Medical Subject Headings) vocabulary terms. We additionally identify sample sizes, predict the risk of bias, and extract text conveying key findings. We store all extracted data in a database, which we make freely available for download, and via a search portal, which allows users to enter structured clinical queries. Results are ranked automatically to prioritize larger and higher-quality studies. Results: As of early June 2020, we have indexed 673 191 publications of RCTs, of which 22 363 were published in the first 5 months of 2020 (142 per day). We additionally include 304 111 trial registrations from the International Clinical Trials Registry Platform. The median trial sample size was 66. Conclusions: We present an automated system for finding and categorizing RCTs. This yields a novel resource: a database of structured information automatically extracted for all published RCTs in humans. We make daily updates of this database available on our website (https://trialstreamer.robotreviewer.net).
  5. Existing deep neural network (DNN) frameworks optimize the computation graph of a DNN by applying graph transformations manually designed by human experts. This approach misses possible graph optimizations and is difficult to scale, as new DNN operators are introduced on a regular basis. We propose TASO, the first DNN computation graph optimizer that automatically generates graph substitutions. TASO takes as input a list of operator specifications and generates candidate substitutions using the given operators as basic building blocks. All generated substitutions are formally verified against the operator specifications using an automated theorem prover. To optimize a given DNN computation graph, TASO performs a cost-based backtracking search, applying the substitutions to find an optimized graph, which can be directly used by existing DNN frameworks. Our evaluation on five real-world DNN architectures shows that TASO outperforms existing DNN frameworks by up to 2.8X, while requiring significantly less human effort. For example, TensorFlow currently contains approximately 53,000 lines of manual optimization rules, while the operator specifications needed by TASO are only 1,400 lines of code. 
  6. Existing deep learning frameworks optimize the computation graph of a DNN model by performing greedy rule-based graph transformations, which generally only consider transformations that strictly improve runtime performance. We propose relaxed graph substitutions that enable the exploration of complex graph optimizations by relaxing the strict performance improvement constraint, which greatly increases the space of semantically equivalent computation graphs that can be discovered by repeated application of a suitable set of graph transformations. We introduce a backtracking search algorithm over a set of relaxed graph substitutions to find optimized networks and use a flow-based graph split algorithm to recursively split a computation graph into smaller subgraphs to allow efficient search. We implement relaxed graph substitutions in a system called MetaFlow and show that MetaFlow improves the inference and training performance by 1.1-1.6× and 1.1-1.2× respectively over existing deep learning frameworks.
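
The cost-based backtracking search over relaxed graph substitutions described in records 5 and 6 can be illustrated with a toy sketch. This is not the TASO or MetaFlow implementation: the operator names, the costs in `OP_COST`, the two rules in `SUBSTITUTIONS`, the flat tuple standing in for a computation graph, and the relaxation parameter `alpha` are all assumptions made for illustration. The sketch shows only the idea stated in the abstracts: a substitution that temporarily raises cost is admitted so that a later substitution can lower it.

```python
import heapq

# Hypothetical per-operator costs (arbitrary units), for illustration only.
OP_COST = {"conv1x1": 3.0, "conv3x3": 4.0, "fused_conv3x3": 6.0}

# Hypothetical substitutions: (pattern, replacement) over adjacent operators.
SUBSTITUTIONS = [
    (("conv1x1",), ("conv3x3",)),                  # enlarge kernel: cost goes up
    (("conv3x3", "conv3x3"), ("fused_conv3x3",)),  # fuse two convs: cost goes down
]


def cost(graph):
    """Total cost of a toy 'graph', modeled here as a tuple of operator names."""
    return sum(OP_COST[op] for op in graph)


def neighbors(graph):
    """Yield every graph reachable by applying one substitution at one position."""
    for pattern, replacement in SUBSTITUTIONS:
        n = len(pattern)
        for i in range(len(graph) - n + 1):
            if graph[i:i + n] == pattern:
                yield graph[:i] + replacement + graph[i + n:]


def backtracking_search(start, alpha=1.0):
    """Return the cheapest graph found.

    With alpha = 1.0 only strictly cheaper candidates are explored (greedy).
    With alpha > 1.0 a candidate is explored whenever its cost is below
    alpha times the best cost seen so far (the relaxed constraint).
    """
    best = start
    queue = [(cost(start), start)]  # min-heap ordered by cost
    seen = {start}
    while queue:
        current_cost, graph = heapq.heappop(queue)
        if current_cost < cost(best):
            best = graph
        for candidate in neighbors(graph):
            if candidate not in seen and cost(candidate) < alpha * cost(best):
                seen.add(candidate)
                heapq.heappush(queue, (cost(candidate), candidate))
    return best


start = ("conv1x1", "conv3x3")
greedy_result = backtracking_search(start, alpha=1.0)
relaxed_result = backtracking_search(start, alpha=1.2)
```

In this toy setting the greedy search (`alpha=1.0`) stays at the starting graph, because the only applicable substitution raises cost from 7 to 8. The relaxed search (`alpha=1.2`) admits that step and then reaches the fused graph at cost 6.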